Abstract: Information-efficient approaches for extracting randomness
from imperfect sources have been extensively studied, but simpler and
faster ones are required in high-speed applications of random number
generation. In this paper, we focus on linear constructions, namely,
applying linear transformations for randomness extraction. We show
that linear transformations based on sparse random matrices are
asymptotically optimal for extracting randomness from independent
sources and bit-fixing sources, and that they are efficient (although
not necessarily optimal) for extracting randomness from hidden Markov
sources. Further study demonstrates the flexibility of such
constructions with respect to source models, as well as their
excellent information-preserving capabilities. Since linear
transformations based on sparse random matrices are computationally
fast and easy to implement in hardware such as FPGAs, they are very
attractive for high-speed applications. In addition, we explore
explicit constructions of transformation matrices. We show that the
generator matrices of primitive BCH codes are good choices, but linear
transformations based on such matrices require more computational time
due to their high densities.

Index Terms: Randomness extraction, linear transformations, sparse
random matrices.



I. Introduction 

RANDOMNESS plays an important role in many fields, 
including complexity theory, cryptography, information 
theory and optimization. There are many randomized algo- 
rithms that are faster, more space efficient or simpler than any 
known deterministic algorithms [18]; hence, how to generate 
random numbers becomes an essential question in computer 
science. Pseudo-random numbers have been studied, but they 
cannot perfectly simulate truly random bits or have security 
issues in some applications. These problems motivate people 
to extract random bits from natural sources directly. In this paper,
we study linear transformations for randomness extraction. This
approach is attractive due to its computational simplicity and
information efficiency. Specifically, given an input binary sequence
$X$ of length $n$ generated from an imperfect source, we construct an
$n \times m$ binary matrix $M$, called a transformation matrix, such
that the output sequence

$$Y = XM$$

is very close to the uniform distribution on $\{0,1\}^m$. Statistical
distance [26] is commonly used to measure the distance between two
distributions in randomness extraction. We say $Y \in \{0,1\}^m$ is
$\epsilon$-close to the uniform distribution $U_m$ on $\{0,1\}^m$ if
and only if

$$\frac{1}{2} \sum_{y \in \{0,1\}^m} \left| P[Y = y] - 2^{-m} \right| \le \epsilon, \qquad (1)$$

where $\epsilon > 0$ can be arbitrarily small. This condition
guarantees that in any probabilistic application, if we replace truly
random bits with the sequence $Y$, the additional error probability
caused by the replacement is at most $\epsilon$.
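The left-hand side of condition (1) can be evaluated directly for small $m$. The following is a minimal sketch in Python; the function name `statistical_distance_to_uniform` and the two-bit example distribution are illustrative assumptions, not part of the paper.

```python
from itertools import product

def statistical_distance_to_uniform(dist, m):
    """Return (1/2) * sum over y in {0,1}^m of |P[Y = y] - 2^-m|.

    `dist` maps m-bit tuples to probabilities; tuples that are
    missing from the mapping are treated as having probability 0.
    """
    uniform = 2.0 ** (-m)
    return 0.5 * sum(abs(dist.get(y, 0.0) - uniform)
                     for y in product((0, 1), repeat=m))

# Hypothetical example: two independent bits, each 1 with probability 0.6.
q = 0.6
two_bits = {(a, b): (q if a else 1 - q) * (q if b else 1 - q)
            for a in (0, 1) for b in (0, 1)}
distance = statistical_distance_to_uniform(two_bits, 2)
print(distance)
```

The enumeration costs $2^m$ steps, so it is only practical as a sanity check on toy examples.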

The classical question in randomness extraction considers ideal
sources, like biased coins or Markov chains. From such sources, the
bits extracted can be perfectly random, that is, independent and
unbiased. The question dates back to von Neumann [30], who first
considered the problem of simulating an unbiased coin by using a
biased coin with unknown probability. His beautiful algorithm was
later improved by Elias [8] and Peres [20]. In 1986, Blum [?] studied
the problem of generating random bits from a correlated source;
specifically, he considered finite Markov chains. Recently, we
generalized Blum's method and proposed the first known algorithm that
runs in expected linear time and achieves the information-theoretic
upper bound on efficiency [?]. Although it is known how to extract
random bits optimally from biased coins or Markov chains, these models
are too narrow to describe real sources that suffer from noise and
disturbance.

During the last two decades, research has focused on a general source
model called $k$-sources [33], in which each possible sequence has
probability at most $2^{-k}$ of being generated. This model can cover
a very wide range of natural random sources, but it was shown that it
is impossible to derive a single function that extracts even a single
bit of randomness from such a source. This observation led to the
introduction of seeded extractors, which use a small number of truly
random bits as the seed (catalyst). When simulating a probabilistic
algorithm, one can simply eliminate the requirement of truly random
bits by enumerating all possible strings for the seed and taking a
majority vote. There are a variety of very efficient constructions of
seeded extractors, summarized in [7], [19], [26]. Although seeded
extractors are information-efficient and applicable to most natural
sources, they are not computationally fast when simulating
probabilistic algorithms. Recently, there has been renewed interest in
designing seedless extractors, called deterministic extractors.
Several specific classes of sources have been studied, including
independent sources, which can be divided into several independent
parts consisting of certain amounts of randomness [3], [21]-[23];
bit-fixing sources, where some bits in a binary sequence are truly
random and the remaining bits are fixed [?], [10], [12]; and samplable
sources, where the source is generated by a process that has a
bounded amount of computational resources like space [13], [?].
Unlike prior works on deterministic extractors, we take both
simplicity and efficiency into consideration. Simplicity is certainly
an important issue; for example, it motivates the use of von Neumann's
scheme [30] in Intel's random number generator (RNG) [11] rather than
some other more sophisticated extractors. However, von Neumann's
scheme is far from optimal in its efficiency, and it only works for
ideal biased coins. Recently, in order to support future generations
of hardware security in systems operating at ultrafast bit rates, many
high-speed random number generators based on chaotic semiconductor
lasers have been developed [28]. They can generate random bits at
rates as high as 12.5 to 400 Gbit/s [2], [14], [24]; hence, the
simplicity of post-processing is becoming more important. These
challenges motivate us to develop extractors that can extract
randomness from natural sources in a manner that reaches the
theoretical upper bound on efficiency without compromising simplicity.
In particular, we focus on linear constructions; that is, we apply
linear transformations for randomness extraction.

Our main contribution is to show that linear transformations 
based on sparse random matrices are asymptotically optimal 
for extracting randomness from independent sources and bit- 
fixing sources, and they are efficient (although not necessar- 
ily optimal) for extracting randomness from hidden Markov 
sources. We further show that these conclusions hold if we 
apply any invertible linear mapping on the sources. In fact, 
many natural sources for the purpose of high-speed random 
number generation are qualified to fit one of the above models 
or their mixture, making the construction based on sparse 
random matrices very attractive in practical use. The resulting 
extractors are not seeded extractors, which consume truly ran- 
dom bits whenever extracting randomness. They are, in some 
sense, probabilistic constructions of deterministic extractors. In 
addition, we explore explicit constructions of transformation 
matrices. We show that the generator matrices of primitive 
BCH codes are good choices, but linear transformations based 
on such matrices require more computational time due to their 
high densities. 

The remainder of this paper is organized as follows. In Section II we
give an intuitive overview of linear transformations for randomness
extraction and present some general properties. In Section III we
introduce the source models to be addressed in this paper and briefly
describe our main results. The detailed discussions for each source
model, including independent sources, hidden Markov sources,
bit-fixing sources and linear-subspace sources, are given in
Section IV, Section V, Section VI and Section VII, respectively. In
Section VIII we briefly describe implementation issues, followed by
concluding remarks in Section IX.

II. Linear Transformations 

Let us start from a simple and fundamental question in random number
generation: given a set of coin tosses $x_1, x_2, \ldots, x_n$ with
$P[x_i = 1] \in [\frac{1}{2} - \delta, \frac{1}{2} + \delta]$, how can
we simulate a single coin toss such that it is as unbiased as
possible? This question has been well studied, and it is known that
the binary sum operation is optimal among all methods, i.e., we
generate a bit $z$ which is

$$z = x_1 + x_2 + \cdots + x_n \bmod 2.$$

The following lemma shows that the binary sum operation can decrease
the bias of the resulting coin toss exponentially.

Lemma 1. [15] Let $x_1, x_2, \ldots, x_n$ be $n$ independent bits and
let the bias of $x_i$ be $\delta_i$, namely,

$$\delta_i = \left| P[x_i = 1] - \frac{1}{2} \right|$$

for $1 \le i \le n$. Then the bias of
$z = x_1 + x_2 + \cdots + x_n \bmod 2$ is upper bounded by

$$\frac{\prod_{i=1}^{n} (2\delta_i)}{2}.$$
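Lemma 1 can be checked numerically on a toy example. The sketch below computes the exact bias of the binary sum by enumerating all bit patterns and compares it with the bound of the lemma; the function names and the probabilities in `probs` are illustrative assumptions.

```python
from itertools import product
from math import prod

def xor_bias(probs):
    """Exact bias |P[z = 1] - 1/2| of z = x_1 + ... + x_n mod 2,
    where probs[i] = P[x_i = 1] and the bits are independent."""
    p_one = 0.0
    for bits in product((0, 1), repeat=len(probs)):
        if sum(bits) % 2 == 1:
            p_one += prod(p if b else 1 - p for p, b in zip(probs, bits))
    return abs(p_one - 0.5)

def lemma1_bound(probs):
    """The bound of Lemma 1: prod_i (2 * delta_i) / 2."""
    return prod(2 * abs(p - 0.5) for p in probs) / 2

probs = [0.6, 0.3, 0.55, 0.8]  # hypothetical values of P[x_i = 1]
print(xor_bias(probs), lemma1_bound(probs))
```

For independent bits the two printed values agree up to rounding, and each additional bit with bias below 1/2 shrinks the bias of the sum by a factor $2\delta_i < 1$.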

A generalization of the above question is the following: given $n$
independent bits, how do we generate $m < n$ random bits such that
their statistical distance to truly random bits is as small as
possible? One way is to divide the $n$ independent bits into $m$
nonoverlapping groups, denoted by $S_1, S_2, \ldots, S_m$, such that
$\bigcup_{i=1}^{m} S_i = \{x_1, x_2, \ldots, x_n\}$. For
$1 \le i \le m$, the $i$th output bit, denoted by $y_i$, is produced by
summing the bits in $S_i$ modulo two. However, this method is not very
efficient. By allowing overlaps between different groups, the
efficiency can be significantly improved. In this case, although we
sacrifice a little independence among the output bits, the bias of
each bit is reduced considerably. An equivalent way of presenting this
method is to use a binary matrix, denoted by $M$, such that
$M_{ij} = 1$ if and only if $x_i \in S_j$, and $M_{ij} = 0$ otherwise.
As a result, the output of this method is $Y = XM$ for a given input
sequence $X$. This gives an intuitive understanding of why linear
transformations can be used for randomness extraction from weak random
sources, in particular, from independent sources.
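The correspondence between groups and the matrix $M$ can be sketched in a few lines of Python. The helper names, the 0-based indexing, and the example groups are assumptions made for illustration only.

```python
def groups_to_matrix(n, groups):
    """Build the n x m matrix M with M[i][j] = 1 iff bit i is in group S_j.
    Bits are indexed from 0 here, unlike the 1-based indices in the text."""
    return [[1 if i in S else 0 for S in groups] for i in range(n)]

def apply_linear_transformation(x, M):
    """Compute Y = XM over GF(2) for a bit list x and an n x m matrix M."""
    n, m = len(M), len(M[0])
    assert len(x) == n
    return [sum(x[i] & M[i][j] for i in range(n)) % 2 for j in range(m)]

x = [1, 0, 1, 1, 0, 1]
disjoint = [{0, 1, 2}, {3, 4, 5}]           # nonoverlapping groups
overlapping = [{0, 1, 2, 3}, {2, 3, 4, 5}]  # groups sharing bits 2 and 3

y_disjoint = apply_linear_transformation(x, groups_to_matrix(6, disjoint))
y_overlap = apply_linear_transformation(x, groups_to_matrix(6, overlapping))
print(y_disjoint, y_overlap)
```

With overlapping groups each output bit sums more input bits, which is what reduces its bias by Lemma 1.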

In this paper, we study linear transformations for extracting
randomness from a few types of random sources. Given a source
$X \in \{0,1\}^n$, we design a transformation matrix $M$ such that the
output $Y = XM$ is arbitrarily close to truly random bits. Here, we
use the statistical distance between $Y$ and the uniform distribution
over $\{0,1\}^m$ to measure the goodness of the output sequence $Y$,
defined by

$$\rho(Y) = \frac{1}{2} \sum_{y \in \{0,1\}^m} \left| P[Y = y] - 2^{-m} \right|. \qquad (2)$$

It indicates the maximum error probability introduced by replacing
truly random bits with the sequence $Y$ in any randomized algorithm.

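Given a random source $X$ and a matrix $M$, the following lemma gives
an upper bound on $\rho(XM)$.

Lemma 2. Let $X = x_1 x_2 \ldots x_n$ be a binary sequence generated
from an arbitrary random source and let $M$ be an $n \times m$ binary
matrix with $m \le n$. Then given $Y = XM$, we have

$$\rho(Y) \le \sum_{u \in \{0,1\}^m,\, u \ne 0} \left| P_X[XMu^T = 1] - \frac{1}{2} \right|.$$

Proof: Similar to the idea in [15], for all $y \in \{0,1\}^m$ we
define the function $h$ as $h(y) = P[Y = y]$. Denoting the Fourier
transform of this function by $F_h$, we have

$$\forall y \in \{0,1\}^m, \quad h(y) = 2^{-m} \sum_{u \in \{0,1\}^m} F_h(u) (-1)^{y \cdot u}$$

and

$$\forall u \in \{0,1\}^m, \quad F_h(u) = \sum_{y \in \{0,1\}^m} h(y) (-1)^{y \cdot u}.$$

When $u = 0$, we have

$$|F_h(u)| = \sum_{y \in \{0,1\}^m} h(y) = 1.$$

When $u \ne 0$, we have

$$\begin{aligned}
|F_h(u)| &= \Big| \sum_{y \in \{0,1\}^m} h(y) (-1)^{y \cdot u} \Big| \\
&= \Big| \sum_{y \cdot u = 0} h(y) - \sum_{y \cdot u = 1} h(y) \Big| \\
&= \Big| 1 - 2 \sum_{y \cdot u = 1} h(y) \Big| \\
&= 2 \left| P[XMu^T = 1] - \frac{1}{2} \right|. \qquad (3)
\end{aligned}$$

Substituting (3) into (2) leads to

$$\begin{aligned}
\rho(Y) &= \frac{1}{2} \sum_{y \in \{0,1\}^m} \Big| 2^{-m} \sum_{u \in \{0,1\}^m} F_h(u) (-1)^{y \cdot u} - 2^{-m} \Big| \\
&\le \frac{1}{2} \sum_{y \in \{0,1\}^m} 2^{-m} \sum_{u \ne 0} |F_h(u)| \\
&= \frac{1}{2} \sum_{u \ne 0} |F_h(u)| \\
&\le \sum_{u \ne 0} \left| P[XMu^T = 1] - \frac{1}{2} \right|. \qquad (4)
\end{aligned}$$

This completes the proof. ■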
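The bound of Lemma 2 holds for any source, including correlated ones, and this can be checked by brute force on a tiny example. In the sketch below the three-bit source distribution and the matrix `M` are arbitrary illustrative choices; note that $XMu^T$ equals the inner product $y \cdot u$ of the output with $u$.

```python
from itertools import product

def output_distribution(source, M):
    """Distribution of Y = XM over GF(2); `source` maps n-bit tuples to probabilities."""
    n, m = len(M), len(M[0])
    dist = {}
    for x, px in source.items():
        y = tuple(sum(x[i] & M[i][j] for i in range(n)) % 2 for j in range(m))
        dist[y] = dist.get(y, 0.0) + px
    return dist

def rho(dist, m):
    """Statistical distance between `dist` and the uniform distribution on {0,1}^m."""
    return 0.5 * sum(abs(dist.get(y, 0.0) - 2.0 ** (-m))
                     for y in product((0, 1), repeat=m))

def lemma2_bound(dist, m):
    """Sum over nonzero u of |P[Y u^T = 1] - 1/2|."""
    total = 0.0
    for u in product((0, 1), repeat=m):
        if any(u):
            p_one = sum(py for y, py in dist.items()
                        if sum(a & b for a, b in zip(y, u)) % 2 == 1)
            total += abs(p_one - 0.5)
    return total

# Hypothetical correlated source on 3 bits and a 3 x 2 transformation matrix.
source = {(0, 0, 0): 0.30, (0, 1, 1): 0.20, (1, 0, 1): 0.10,
          (1, 1, 0): 0.15, (1, 1, 1): 0.25}
M = [[1, 0], [1, 1], [0, 1]]
dist = output_distribution(source, M)
print(rho(dist, 2), lemma2_bound(dist, 2))
```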



There are some related works focusing on the construction of linear
transformations for the purpose of randomness extraction. In [16],
Lacharme studied linear correctors, and his goal is to generate a
random sequence $Y$ of length $m$ such that

$$\max_{y \in \{0,1\}^m} \left| P[Y = y] - 2^{-m} \right| \le \epsilon$$

for a specified small constant $\epsilon$. At almost the same time as
our work, in [1], Abbe uses polar codes to construct deterministic
extractors. His idea is that, given an independent sequence $X$, if we
let $X' = XG_n$ with
$G_n = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}^{\otimes \log_2 n}$,
then a subset of the components of $X'$ are roughly i.i.d. uniform and
the remaining components are roughly deterministic. It was proved that
this approach can generate a random sequence $Y$ of length $m$ with
entropy at least $m(1 - \epsilon)$. In both of the works above, the
random bits generated are 'weaker' than the requirement of statistical
distance. For instance, let $Y$ be a random sequence of length $m$,
and assume $P[Y = y]$ with $y \in \{0,1\}^m$ is either $2^{-(m-1)}$ or
$0$. In this case, as $m \to \infty$, we have

$$\max_{y \in \{0,1\}^m} \left| P[Y = y] - 2^{-m} \right| \to 0, \qquad 1 - \frac{H(Y)}{m} = \frac{1}{m} \to 0.$$

That means this sequence $Y$ satisfies the requirement of randomness
in both of the works. But if we consider the statistical distance of
$Y$ to the uniform distribution on $\{0,1\}^m$, it is

$$\rho(Y) = \frac{1}{2} \sum_{y \in \{0,1\}^m} \left| P[Y = y] - 2^{-m} \right| = \frac{1}{2}.$$

That does not satisfy our requirement of randomness in the sense of
statistical distance. From this point of view, we generate random bits
with a higher requirement on quality than the above works.
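The three quality measures in the example above can be compared numerically. The sketch below builds a distribution that is uniform on half of $\{0,1\}^m$; choosing the half with first bit equal to 0 is an arbitrary assumption, since any half gives the same values.

```python
from itertools import product
from math import log2

def half_support_measures(m):
    """Return (max deviation, 1 - H(Y)/m, statistical distance) for a
    distribution that puts mass 2^-(m-1) on the strings with first bit 0."""
    p = 2.0 ** (-(m - 1))
    uniform = 2.0 ** (-m)
    dist = {y: (p if y[0] == 0 else 0.0) for y in product((0, 1), repeat=m)}
    max_dev = max(abs(q - uniform) for q in dist.values())
    entropy = -sum(q * log2(q) for q in dist.values() if q > 0)
    rho = 0.5 * sum(abs(q - uniform) for q in dist.values())
    return max_dev, 1 - entropy / m, rho

for m in (4, 8, 12):
    print(m, half_support_measures(m))
```

The first two values shrink as $m$ grows, while the statistical distance stays at 1/2.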

In the rest of this paper, we investigate those random sources on
$\{0,1\}^n$ such that by applying linear transformations we can get a
random sequence $Y$ with $\rho(Y) \to 0$ as $n \to \infty$.

III. Source Models and Main Results 

In this section, we introduce a few types of random sources, including
independent sources, hidden Markov sources, bit-fixing sources, and
linear-subspace sources, and we summarize our main results for each
type of source. Two constructions of linear transformations will be
presented and analyzed. The first construction is based on sparse
random matrices. We say a random matrix with each entry being one with
probability $p$ is sparse if and only if $p$ is small and
$p = \omega(\frac{\log n}{n})$, which means that
$p > \frac{k \log n}{n}$ for any fixed $k > 0$ when the source length
$n \to \infty$. The second construction is explicit: it is based on
the generator matrices of linear codes with binomial weight
distributions. The drawback of this construction is that it requires
more computation than the first one.

Given a source $X$, let $H_{\min}(X)$ denote its min-entropy, defined
by

$$H_{\min}(X) = \min_{x \in \{0,1\}^n} \log_2 \frac{1}{P[X = x]}. \qquad (5)$$

For many sources, such as independent sources and bit-fixing sources,
the number of random bits that can be extracted using deterministic
extractors is asymptotically upper bounded by the min-entropy of the
source. Note that this is not always true for some special sources
when the input sequence is infinitely long. For example, consider a
source on $\{0,1\}^n$ such that there is one assignment with
probability $2^{-n/2}$ and all the other assignments have probability
either $2^{-n}$ or $0$. For this source, its min-entropy is
$\frac{n}{2}$, but as $n \to \infty$, this source itself is
arbitrarily close to the uniform distribution on $\{0,1\}^n$.

A. Independent Sources 

Independent sources, where the bits generated are independent of each
other, have been studied by Santha and Vazirani [25], Vazirani [29],
P. Lacharme [16], etc. We consider a general model of independent
sources, namely, let $X = x_1 x_2 \ldots x_n \in \{0,1\}^n$ be a
binary sequence generated from such a source; then
$x_1, x_2, \ldots, x_n$ are independent of each other, and all their
probabilities are unknown and may be different. We assume that this
source contains a certain amount of randomness, i.e., its min-entropy
$H_{\min}(X)$ is known.

Theorem 1. Let $X = x_1 x_2 \ldots x_n \in \{0,1\}^n$ be an
independent sequence and let $M$ be an $n \times m$ binary random
matrix in which each entry is 1 with probability
$p = \omega(\frac{\log n}{n}) \le \frac{1}{2}$. Assume $Y = XM$. If
$\frac{m}{H_{\min}(X)} < 1$, as $n \to \infty$, $\rho(Y)$ converges to
$0$ in probability, i.e.,

$$\rho(Y) \xrightarrow{p} 0.$$

This shows that linear transformations based on sparse random matrices
are asymptotically optimal for extracting randomness from independent
sources. To consider explicit constructions, we focus on a type of
independent source $X = x_1 x_2 \ldots x_n \in \{0,1\}^n$ such that
the probability of $x_i$ for all $1 \le i \le n$ is slightly
unpredictable, i.e.,

$$P[x_i = 1] \in \left[ \frac{1}{2} - \frac{\epsilon}{2}, \frac{1}{2} + \frac{\epsilon}{2} \right]$$

with a constant $\epsilon$. For such a source, it is possible to have
min-entropy $n \log_2 \frac{2}{1+\epsilon}$. The following result
shows that we can have an explicit construction that can extract as
many as $n \log_2 \frac{2}{1+\epsilon}$ random bits from $X$
asymptotically.

Theorem 2. Let $C$ be a linear code with dimension $m$ and codeword
length $n$. Assume its weight distribution is binomial and its
generator matrix is $G$. Let $X = x_1 x_2 \ldots x_n \in \{0,1\}^n$ be
an independent source such that
$P[x_i = 1] \in [\frac{1}{2} - \frac{\epsilon}{2}, \frac{1}{2} + \frac{\epsilon}{2}]$
for all $1 \le i \le n$, and let $Y = XG^T$. If
$\frac{m}{n \log_2 \frac{2}{1+\epsilon}} < 1$, as $n \to \infty$, we
have

$$\rho(Y) \to 0.$$

This result shows that if we can construct a linear code with binomial
weight distribution, it can extract as many as
$n \log_2 \frac{2}{1+\epsilon}$ random bits asymptotically. It is
known that primitive BCH codes have approximately binomial weight
distribution. Hence, they are good candidates for extracting
randomness from independent sources with bounded bias.

B. Hidden Markov Sources 

A more useful but less studied model is a hidden Markov source. It is
a good description of many natural sources for the purpose of
high-speed random number generation, such as those based on thermal
noise or clock drift. Given a binary sequence
$X = x_1 x_2 \ldots x_n \in \{0,1\}^n$ produced by such a source, we
let $\theta_i$ be the complete information about the system at time
$i$ with $1 \le i \le n$. Examples of this system information include
the value of the noise signal, the temperature, the environmental
effects, the bit generated at time $i$, etc. So the bit generated at
time $i$, i.e., $x_i$, is just a function of $\theta_i$. We say that
this source has the hidden Markov property if and only if for all
$1 < i \le n$,

$$P[x_i \mid \theta_{i-1}, x_{i-1}, x_{i-2}, \ldots, x_1] = P[x_i \mid \theta_{i-1}].$$

That means the bit generated at time $i$ only depends on the complete
system information at time $i - 1$.



To analyze the performance of linear transformations on hidden Markov
sources, we assume that the external noise of the sources is bounded;
hence, we assume that for any three time points
$1 \le i_1 < i_2 < i_3 \le n$,

$$P[x_{i_2} = 1 \mid \theta_{i_1}, \theta_{i_3}] \in \left[ \frac{1}{2} - \frac{\epsilon}{2}, \frac{1}{2} + \frac{\epsilon}{2} \right] \qquad (6)$$

with a constant $\epsilon$.

Theorem 3. Let $X = x_1 x_2 \ldots x_n$ be a binary sequence generated
from a hidden Markov source described above. Let $M$ be an
$n \times m$ binary random matrix in which the probability of each
entry being 1 is $p = \omega(\frac{\log n}{n}) \le \frac{1}{2}$.
Assume $Y = XM$. If
$\frac{m}{n \log_2 \frac{2}{1+\sqrt{\epsilon}}} < 1$, as $n$ becomes
large enough, we have that $\rho(Y)$ converges to $0$ in probability,
i.e.,

$$\rho(Y) \xrightarrow{p} 0.$$

The following theorem implies that we can also use the generator
matrices of primitive BCH codes for extracting randomness from hidden
Markov sources, due to their approximately binomial weight
distributions.

Theorem 4. Let $C$ be a linear binary code with dimension $m$ and
codeword length $n$. Assume its weight distribution is binomial and
its generator matrix is $G$. Let $X = x_1 x_2 \ldots x_n$ be a binary
sequence generated from a hidden Markov source described above, and
let $Y = XG^T$. If
$\frac{m}{n \log_2 \frac{2}{1+\sqrt{\epsilon}}} < 1$, as
$n \to \infty$, we have

$$\rho(Y) \to 0.$$

Although our constructions of linear transformations are not able to
extract randomness optimally from hidden Markov sources, they tolerate
local correlations well. The gap between their information efficiency
and the optimum is reasonably small for hidden Markov sources,
especially considering their constructive simplicity and the fact that
most physical sources for high-speed random number generation are
roughly independent, with only a very small amount of correlation.

C. Bit-Fixing Sources 

Bit-fixing sources were first studied by Cohen and Wigderson [?]. In
an oblivious bit-fixing source $X$ of length $n$, $k$ bits in $X$ are
unbiased and independent, and the remaining $n - k$ bits are fixed. We
also have nonoblivious bit-fixing sources, in which the remaining
$n - k$ bits linearly depend on the $k$ independent and unbiased bits.
Such sources were originally studied in the context of collective coin
flipping [?]. Here, when we say a bit-fixing source, we mean the
general nonoblivious case.

Theorem 5. Let $X = x_1 x_2 \ldots x_n \in \{0,1\}^n$ be a bit-fixing
source in which $k$ bits are unbiased and independent. Let $M$ be an
$n \times m$ binary random matrix in which the probability of each
entry being 1 is $p = \omega(\frac{\log n}{n}) \le \frac{1}{2}$.
Assume $Y = XM$. If $\frac{m}{k} < 1$, as $n$ becomes large enough, we
have that $\rho(Y) = 0$ with probability almost 1, i.e.,

$$P_M[\rho(Y) = 0] \to 1.$$
So sparse random matrices are asymptotically optimal for extracting
randomness from bit-fixing sources. Unfortunately, for bit-fixing
sources, it is not easy to find an efficient and explicit construction
of linear transformations.
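For an oblivious bit-fixing source, whether a given matrix extracts perfectly can be tested with linear algebra: the output $Y = XM$ is exactly uniform when the rows of $M$ at the $k$ free positions have rank $m$ over GF(2). This is a standard fact rather than a statement taken from the theorem above. The sketch below checks it against brute-force enumeration; the matrix, the free positions, and the helper names are illustrative assumptions.

```python
from itertools import product

def gf2_rank(rows):
    """Rank over GF(2) of a matrix given as a list of 0/1 lists."""
    pivots = {}  # leading-bit position -> basis vector stored as an int
    for row in rows:
        v = int("".join(map(str, row)), 2)
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                break
            v ^= pivots[h]  # cancel the leading bit and keep reducing
    return len(pivots)

def extracts_perfectly(M, free_rows):
    """True iff the rows of M at the free positions have rank m."""
    return gf2_rank([M[i] for i in free_rows]) == len(M[0])

def output_is_uniform(M, free_rows, fixed_bits):
    """Brute force: enumerate the free bits and count each output value."""
    n, m = len(M), len(M[0])
    counts = {}
    for z in product((0, 1), repeat=len(free_rows)):
        x = list(fixed_bits)
        for pos, bit in zip(free_rows, z):
            x[pos] = bit
        y = tuple(sum(x[i] & M[i][j] for i in range(n)) % 2 for j in range(m))
        counts[y] = counts.get(y, 0) + 1
    return len(counts) == 2 ** m and len(set(counts.values())) == 1

M = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 1]]  # hypothetical 5 x 2 matrix
fixed_bits = [1, 0, 1, 0, 0]                  # values used at the fixed positions
print(extracts_perfectly(M, [1, 3, 4]), extracts_perfectly(M, [1, 4]))
```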

D. Linear-Subspace Sources 

We generalize the sources described above in the following way: assume
$X \in \{0,1\}^n$ is a raw sequence that can be written as $ZA$, where
$Z \in \{0,1\}^k$ with $k \le n$ is an independent sequence or a
hidden Markov sequence, and $A$ is a $k \times n$ unknown matrix with
full rank, i.e., the mapping $Z \mapsto ZA$ is invertible. Instances
of such sources include sparse images studied in compressive sensing.
We call such sources linear-subspace sources, namely, they are
obtained by mapping simpler sources into a subspace of higher
dimension. We demonstrate that linear transformations based on sparse
random matrices can work on linear-subspace sources, and that any
invertible linear operation on the sources does not affect the
asymptotic performance. Specifically, we have the following theorem.

Theorem 6. Let $X = x_1 x_2 \ldots x_n \in \{0,1\}^n$ be a source such
that $X = ZA$, in which $Z$ is an independent sequence and $A$ is an
unknown $k \times n$ full-rank matrix. Let $M$ be an $n \times m$
random matrix such that each entry of $M$ is 1 with probability
$p = \omega(\frac{\log n}{n}) \le \frac{1}{2}$. Assume $Y = XM$. If
$\frac{m}{H_{\min}(X)} < 1$, as $n \to \infty$, $\rho(Y)$ converges to
$0$ in probability, i.e.,

$$\rho(Y) \xrightarrow{p} 0.$$

A similar result holds if $Z$ is a hidden Markov sequence. In this
case, we only need to replace $H_{\min}(X)$ with
$k \log_2 \frac{2}{1+\sqrt{\epsilon}}$, where $k$ is the length of $Z$
and $\epsilon$ is defined in Equ. (6).

E. Comments 

Compared to $k$-sources, the models that we study in this paper are
more specific. They may not describe perfectly some sources, such as
users' operating behaviors or English articles. But for most natural
sources that are used for building high-speed random number
generators, they are very good descriptions. Based on these models, we
can explore simpler and more practical algorithms than those designed
for general $k$-sources. In the following sections, we present our
technical results in detail for each type of source.

IV. Independent Sources 

In this section, we study a general independent source
$X = x_1 x_2 \ldots x_n \in \{0,1\}^n$, in which all the bits
$x_1, x_2, \ldots, x_n$ are independent of each other and the
probability of $x_i$ with $1 \le i \le n$ can take an arbitrary value,
i.e., $p_i \in [0, 1]$. We can consider this source as a biased coin
in the presence of external adversaries.

Lemma 3. Given a deterministic extractor
$f : \{0,1\}^n \to \{0,1\}^m$, as $n \to \infty$, we have
$\rho(f(X)) \to 0$ for an arbitrary independent source $X$ only if

$$\frac{m}{H_{\min}(X)} \le 1,$$

where $H_{\min}(X)$ is the min-entropy of $X$.

Proof: To prove this lemma, we only need to consider a source
$X = x_1 x_2 \ldots x_n \in \{0,1\}^n$ such that

$$P[x_i = 1] = \frac{1}{2}, \quad \forall\, 1 \le i \le H_{\min}(X),$$

and

$$P[x_i = 1] = 0, \quad \forall\, H_{\min}(X) < i \le n.$$

From such a source $X$, if $m > H_{\min}(X)$, it is easy to see that
$\rho(f(X)) > 0$ for all $n > 0$. ■

Let us first consider a simple random matrix in which each entry is 1
or 0 with probability 1/2, which we call a uniform random matrix.
Given an independent input sequence $X \in \{0,1\}^n$ and an
$n \times m$ uniform random matrix $M$, let $Y = XM \in \{0,1\}^m$ be
the output sequence. The following lemma provides an upper bound on
$E[\rho(Y)]$.

Lemma 4. Let $X = x_1 x_2 \ldots x_n$ be an independent sequence and
$M$ be an $n \times m$ uniform random matrix. Then given $Y = XM$, we
have

$$E_M[\rho(Y)] \le 2^{m - H_{\min}(X) - 1}.$$

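Proof: Let $p_i$ denote the probability of $x_i$ and let $\delta_i$ be
the bias of $x_i$; then $\delta_i = |p_i - \frac{1}{2}|$.

According to Lemma 1, when $u \ne 0$, we have

$$\left| P_X[XMu^T = 1] - \frac{1}{2} \right| \le \frac{1}{2} \prod_{i=1}^{n} (2\delta_i)^{(Mu^T)_i}, \qquad (7)$$

where $(Mu^T)_i$ is the $i$th element of the vector $Mu^T$.
Substituting (7) into Lemma 2 yields

$$\rho(Y) \le \frac{1}{2} \sum_{u \ne 0} \prod_{i=1}^{n} (2\delta_i)^{(Mu^T)_i}. \qquad (8)$$

Now, we calculate the expectation of $\rho(Y)$, which is

$$\begin{aligned}
E_M[\rho(Y)] &\le \frac{1}{2} \sum_{u \ne 0} E_M\Big[ \prod_{i=1}^{n} (2\delta_i)^{(Mu^T)_i} \Big] \qquad (9) \\
&= \frac{1}{2} \sum_{u \ne 0} \sum_{v \in \{0,1\}^n} P_M[Mu^T = v^T] \prod_{i=1}^{n} (2\delta_i)^{v_i}. \qquad (10)
\end{aligned}$$

Since $M$ is a uniform random matrix (each entry is either 0 or 1 with
probability 1/2), if $u \ne 0$, $Mu^T$ is a random vector of length
$n$ in which each element is 0 or 1 with probability 1/2. So for any
$u \ne 0$,

$$P_M[Mu^T = v^T] = 2^{-n}.$$

As a result,

$$\begin{aligned}
E_M[\rho(Y)] &\le 2^{m-n-1} \sum_{v \in \{0,1\}^n} \prod_{i=1}^{n} (2\delta_i)^{v_i} \\
&= 2^{m-1} \prod_{i=1}^{n} \left( \frac{1}{2} + \delta_i \right).
\end{aligned}$$

For the independent sequence $X$, its min-entropy can be written as

$$\begin{aligned}
H_{\min}(X) &= \log_2 \frac{1}{\prod_{i=1}^{n} \max(p_i, 1 - p_i)} \\
&= \log_2 \frac{1}{\prod_{i=1}^{n} (\frac{1}{2} + \delta_i)}.
\end{aligned}$$

So

$$E_M[\rho(Y)] \le 2^{m - H_{\min}(X) - 1}.$$

This completes the proof. ■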
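For very small $n$ and $m$, the expectation in Lemma 4 can be computed exactly by averaging over all $2^{nm}$ binary matrices. The sketch below does this for a hypothetical independent source with $n = 4$ and $m = 2$; the probabilities and function names are illustrative assumptions.

```python
from itertools import product
from math import log2

def rho_of_output(probs, M):
    """Statistical distance between Y = XM and uniform, for independent
    input bits with probs[i] = P[x_i = 1]."""
    n, m = len(M), len(M[0])
    dist = {}
    for x in product((0, 1), repeat=n):
        px = 1.0
        for p, b in zip(probs, x):
            px *= p if b else 1 - p
        y = tuple(sum(x[i] & M[i][j] for i in range(n)) % 2 for j in range(m))
        dist[y] = dist.get(y, 0.0) + px
    return 0.5 * sum(abs(dist.get(y, 0.0) - 2.0 ** (-m))
                     for y in product((0, 1), repeat=m))

def expected_rho_uniform_matrix(probs, m):
    """Average of rho over all n x m binary matrices (tiny n, m only)."""
    n = len(probs)
    total = 0.0
    for entries in product((0, 1), repeat=n * m):
        M = [list(entries[i * m:(i + 1) * m]) for i in range(n)]
        total += rho_of_output(probs, M)
    return total / 2 ** (n * m)

def lemma4_bound(probs, m):
    """The bound 2^(m - H_min(X) - 1) for an independent source."""
    h_min = -sum(log2(max(p, 1 - p)) for p in probs)
    return 2.0 ** (m - h_min - 1)

probs = [0.5, 0.6, 0.7, 0.5]  # hypothetical values of P[x_i = 1]
m = 2
print(expected_rho_uniform_matrix(probs, m), lemma4_bound(probs, m))
```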

Example 1. Let us consider an independent source
$X = x_1 x_2 \ldots x_{512} \in \{0,1\}^{512}$ in which

$$p_i \in \left[ \frac{1}{2} - \frac{i}{1024}, \frac{1}{2} + \frac{i}{1024} \right]$$

for all $1 \le i \le 512$.

For this source, its min-entropy is

$$H_{\min}(X) \ge -\sum_{i=1}^{512} \log_2 \left( \frac{1}{2} + \frac{i}{1024} \right) = 226.16.$$

If we use a $512 \times 180$ random matrix in which each entry is 0 or
1 with probability 1/2, then according to the above lemma,

$$E[\rho(Y)] \le 2^{-47.16} \le 6.4 \times 10^{-15}.$$

That means that the output sequence is very close to the uniform
distribution in the sense of statistical distance.
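The numbers in Example 1 can be reproduced with a few lines of Python. This is only an arithmetic check of the min-entropy lower bound and of the bound from Lemma 4; the variable names are illustrative.

```python
from math import log2

n, m = 512, 180

# Lower bound on the min-entropy: each bit contributes -log2(1/2 + i/1024).
h_min_lower = -sum(log2(0.5 + i / 1024) for i in range(1, n + 1))

# Bound of Lemma 4 on the expected statistical distance.
bound = 2.0 ** (m - h_min_lower - 1)

print(round(h_min_lower, 2), bound)
```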

When $n$ is large enough, we have the following corollary, showing
that uniform random matrices are capable of extracting as many as
$H_{\min}(X)$ random bits from an independent source $X$
asymptotically with probability almost one. Since $H_{\min}(X)$ is the
theoretical upper bound, such an extractor is asymptotically optimal
in efficiency.

Corollary 5. Let $X \in \{0,1\}^n$ be an independent sequence and let
$M$ be an $n \times m$ uniform random matrix. Assume $Y = XM$. If
$\frac{m}{H_{\min}(X)} < 1$, as $n \to \infty$, $\rho(Y)$ converges to
$0$ in probability, i.e.,

$$\rho(Y) \xrightarrow{p} 0.$$

The above corollary shows that when the length $n$ of the input
sequence is large, we can extract random bits very efficiently from an
independent source by simply constructing a uniform random matrix. We
need to distinguish this method from seeded extractors, which use some
additional random bits whenever extracting randomness. In our method,
the matrix is randomly generated but the extraction itself is still
deterministic; that is, we can use the same matrix to extract
randomness any number of times without reconstructing it. From this
point of view, our method is a 'probabilistic construction of
deterministic extractors'.

Although linear transformations based on uniform random matrices are
very efficient for extracting randomness from independent sources,
they are not computationally fast due to their high density. It is
natural to ask whether it is possible to decrease the density of 1s in
the matrices without affecting the performance too much. Motivated by
this question, we study a sparse random matrix $M$ in which each entry
is 1 with probability $p = \omega(\frac{\log n}{n}) \le \frac{1}{2}$,
where $p = \omega(\frac{\log n}{n})$ means that
$p > \frac{k \log n}{n}$ for any fixed $k$ when $n \to \infty$.
Surprisingly, such a sparse matrix has almost the same performance as
a uniform random matrix, namely, it can extract as many as
$H_{\min}(X)$ random bits when the input sequence is long enough.

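Lemma 6. Let $p = \omega(\frac{\log n}{n}) \le \frac{1}{2}$ and let

$$f_n(p) = \sum_{j=1}^{\frac{1}{2p} \log \frac{1}{\epsilon}} \binom{m}{j} \left( \frac{1}{2} \left( 1 + (1-2p)^j \right) \right)^n$$

with $\epsilon > 0$ and $m = \Theta(n)$. As $n \to \infty$, we have

$$f_n(p) \to 0.$$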

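Proof: Since $m = \Theta(n)$, we can write $m = cn$ with a constant
$c$.

Let us introduce a function $F(j)$, defined by

$$\begin{aligned}
F(j) &= m^j 2^{-n} \left( 1 + (1-2p)^j \right)^n \\
&= c^j n^j 2^{-n} \left( 1 + (1-2p)^j \right)^n.
\end{aligned}$$

Then

$$f_n(p) \le \sum_{j=1}^{\frac{1}{2p} \log \frac{1}{\epsilon}} F(j).$$

First, if $p = \frac{1}{2}$, as $n \to \infty$, we have

$$\begin{aligned}
f_n(p) &\le \sum_{j=1}^{\frac{1}{2p} \log \frac{1}{\epsilon}} c^j n^j 2^{-n} \\
&\le \frac{\log \frac{1}{\epsilon}}{2p} (cn)^{\frac{1}{2p} \log \frac{1}{\epsilon}} 2^{-n} \\
&= \frac{\log \frac{1}{\epsilon}}{2p} 2^{\frac{\log_2 (cn)}{2p} \log \frac{1}{\epsilon} - n} \\
&= 2^{-\Theta(n)} \to 0.
\end{aligned}$$

If $p < \frac{1}{2}$, we show that $F(j)$ decreases as $j$ increases
for $1 \le j \le \frac{1}{2p} \log \frac{1}{\epsilon}$ when $n$ is
large enough. To see this, we show that its derivative $F'(j) < 0$ as
$n \to \infty$:

$$\begin{aligned}
F'(j) &= c^j n^j \log(cn)\, 2^{-n} \left( 1 + (1-2p)^j \right)^n \\
&\quad + c^j n^j 2^{-n} n \left( 1 + (1-2p)^j \right)^{n-1} (1-2p)^j \log(1-2p) \\
&\le c^j n^j 2^{-n} \log(cn) \left( 1 + (1-2p)^j \right)^n \left[ 1 + \frac{(1-2p)^j \log(1-2p)\, n}{2 \log(cn)} \right].
\end{aligned}$$

So we only need to prove that

$$1 + \frac{(1-2p)^j \log(1-2p)\, n}{2 \log(cn)} < 0$$

for $n \to \infty$.

Since $p \le a < \frac{1}{2}$ for a constant $a$, we have

$$(1-2p)^{-\frac{1}{2p}} \le \beta = (1-2a)^{-\frac{1}{2a}},$$

where $\beta$ is a constant. We also have

$$\log(1-2p) \le -2p.$$

Hence,

$$\begin{aligned}
1 + \frac{(1-2p)^j \log(1-2p)\, n}{2 \log(cn)}
&\le 1 + \frac{(1-2p)^{\frac{1}{2p} \log \frac{1}{\epsilon}} \log(1-2p)\, n}{2 \log(cn)} \\
&\le 1 - \frac{\beta^{\log \epsilon}\, 2pn}{2 \log(cn)} \\
&= 1 - \frac{\beta^{\log \epsilon}\, \omega(\log n)}{2 \log(cn)} \\
&< 0.
\end{aligned}$$

So when $p < \frac{1}{2}$ and $n \to \infty$, $F(j)$ decreases as $j$
increases for $1 \le j \le \frac{1}{2p} \log \frac{1}{\epsilon}$. As a
result, when $n$ is large enough, we have

$$\begin{aligned}
f_n(p) &\le \sum_{j=1}^{\frac{1}{2p} \log \frac{1}{\epsilon}} F(j) \\
&\le \frac{\log \frac{1}{\epsilon}}{2p} F(1) \\
&= \frac{\log \frac{1}{\epsilon}}{2p}\, cn (1-p)^n \\
&\le (cn)^2 (1-p)^n.
\end{aligned}$$

Since

$$\begin{aligned}
\log f_n(p) &\le 2 \log c + 2 \log n + n \log(1-p) \\
&\le 2 \log c + 2 \log n - np \\
&\to -\infty,
\end{aligned}$$

we can conclude that

$$f_n(p) \to 0$$

as $n \to \infty$.

This completes the proof. ■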
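The decay of $f_n(p)$ can be observed numerically. The sketch below evaluates the sum for a few values of $n$ under assumptions chosen only for illustration: $m = n/2$, $\epsilon = 0.1$, natural logarithms, and $p = (\log n)^2 / n$, which is one admissible choice satisfying $p = \omega(\frac{\log n}{n})$.

```python
from math import comb, log

def f_n(n, c=0.5, eps=0.1):
    """Evaluate f_n(p) with m = c*n and p = (log n)^2 / n.

    The constants c and eps and the particular form of p are
    illustrative assumptions, not values taken from the paper.
    """
    m = int(c * n)
    p = min(0.5, log(n) ** 2 / n)
    j_max = int(log(1 / eps) / (2 * p))  # upper summation limit, rounded down
    return sum(comb(m, j) * (0.5 * (1 + (1 - 2 * p) ** j)) ** n
               for j in range(1, j_max + 1))

for n in (100, 300, 1000):
    print(n, f_n(n))
```

The sum is dominated by its first term, roughly $m(1-p)^n$, which matches the role of $F(1)$ in the proof.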

Based on the above lemma, we get the following theorem.

Theorem 1. Let $X = x_1 x_2 \ldots x_n \in \{0,1\}^n$ be an
independent sequence and let $M$ be an $n \times m$ binary random
matrix in which each entry is 1 with probability
$p = \omega(\frac{\log n}{n}) \le \frac{1}{2}$. Assume $Y = XM$. If
$\frac{m}{H_{\min}(X)} < 1$, as $n \to \infty$, $\rho(Y)$ converges to
$0$ in probability, i.e.,

$$\rho(Y) \xrightarrow{p} 0.$$

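Proof: Let us use the same notation as above. From Equ. (10), we have

$$E_M[\rho(Y)] \le \frac{1}{2} \sum_{u \ne 0} \sum_{v \in \{0,1\}^n} P_M[Mu^T = v^T] \prod_{i=1}^{n} (2\delta_i)^{v_i}.$$

Since $M$ is a random matrix in which each entry is 1 with probability
$p$, for a fixed vector $u \ne 0$ with $\|u\| = j$, $Mu^T$ is a random
vector where all the entries are independent and each entry is 1 with
probability $p_j$. Here, according to Lemma 1, we have

$$p_j \in \left[ \frac{1}{2} \left( 1 - (1-2p)^j \right), \frac{1}{2} \left( 1 + (1-2p)^j \right) \right].$$

There are in total $\binom{m}{j}$ vectors $u$ with $\|u\| = j$; hence,
we get

$$\begin{aligned}
E_M[\rho(Y)] &\le \frac{1}{2} \sum_{j=1}^{m} \binom{m}{j} \sum_{v \in \{0,1\}^n} \left( \frac{1}{2} \left( 1 + (1-2p)^j \right) \right)^n \prod_{i=1}^{n} (2\delta_i)^{v_i} \\
&= \frac{1}{2} \sum_{j=1}^{m} \binom{m}{j} \left( \frac{1}{2} \left( 1 + (1-2p)^j \right) \right)^n \prod_{i=1}^{n} (1 + 2\delta_i) \\
&= \frac{1}{2} \sum_{j=1}^{m} \binom{m}{j} \left( 1 + (1-2p)^j \right)^n \prod_{i=1}^{n} \left( \frac{1}{2} + \delta_i \right).
\end{aligned}$$

Now, we divide the upper bound of $E_M[\rho(Y)]$ into two terms. To do
this, we let

$$\gamma_1 = \sum_{j=1}^{\frac{1}{2p} \log \frac{1}{\epsilon}} \binom{m}{j} \left( 1 + (1-2p)^j \right)^n \prod_{i=1}^{n} \left( \frac{1}{2} + \delta_i \right),$$

$$\gamma_2 = \sum_{j > \frac{1}{2p} \log \frac{1}{\epsilon}}^{m} \binom{m}{j} \left( 1 + (1-2p)^j \right)^n \prod_{i=1}^{n} \left( \frac{1}{2} + \delta_i \right),$$

where $\epsilon$ can be arbitrarily small; then

$$E_M[\rho(Y)] \le \frac{\gamma_1}{2} + \frac{\gamma_2}{2}.$$

According to Lemma 6, as $n \to \infty$, if
$p = \omega(\frac{\log n}{n}) \le \frac{1}{2}$, then $\gamma_1 \to 0$.
So we only need to consider the second term, that is,

$$\gamma_2 = \sum_{j > \frac{1}{2p} \log \frac{1}{\epsilon}}^{m} \binom{m}{j} \left( 1 + (1-2p)^j \right)^n \prod_{i=1}^{n} \left( \frac{1}{2} + \delta_i \right).$$

Since $(1-2p)^{-\frac{1}{2p}} \ge e$, we can get

$$(1-2p)^{\frac{1}{2p} \log \frac{1}{\epsilon}} \le \epsilon.$$

As a result,

$$\begin{aligned}
\gamma_2 &\le \sum_{j > \frac{1}{2p} \log \frac{1}{\epsilon}}^{m} \binom{m}{j} (1+\epsilon)^n \prod_{i=1}^{n} \left( \frac{1}{2} + \delta_i \right) \\
&\le 2^m (1+\epsilon)^n \prod_{i=1}^{n} \left( \frac{1}{2} + \delta_i \right) \\
&\le 2^{m + n \log_2 (1+\epsilon) - H_{\min}(X)}.
\end{aligned}$$

Since $\epsilon$ can be arbitrarily small, if
$\frac{m}{H_{\min}(X)} < 1$, as $n \to \infty$, we have

$$\gamma_2 \to 0.$$

We can conclude that if $\frac{m}{H_{\min}(X)} < 1$, $E_M[\rho(Y)]$
can be arbitrarily small as $n \to \infty$. This implies that
$\rho(Y) \xrightarrow{p} 0$ as $n \to \infty$.

This completes the proof. ■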
For practical use, we can set some constraints on each column of the sparse random matrices. For example, we can let the number of ones in each column be a constant k. We may also use pseudorandom bits instead of truly random bits. In coding theory, many good codes are constructed based on randomly generated matrices. Such examples include LDPC (low-density parity-check) codes, network coding and compressive sensing. While these codes have very good performance, efficient decoding algorithms are needed to recover the original messages. Compared to those applications, randomness extraction is a one-way process in which we do not need to reconstruct the input sequences (nor can we, due to the entropy loss). This feature makes linear transformations based on random matrices very attractive in the applications of randomness extraction.
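
The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the seed, the sizes n = 64 and m = 16, the column weight 8 and the source bias 0.6 are all hypothetical choices, and Python's pseudorandom generator stands in for the random bits used to build the matrix.

```python
import random

def sparse_random_matrix(n, m, p, rng):
    """n x m binary matrix; each entry is 1 independently with probability p."""
    return [[1 if rng.random() < p else 0 for _ in range(m)] for _ in range(n)]

def fixed_column_weight_matrix(n, m, k, rng):
    """n x m binary matrix with exactly k ones in every column."""
    matrix = [[0] * m for _ in range(n)]
    for col in range(m):
        for row in rng.sample(range(n), k):
            matrix[row][col] = 1
    return matrix

def extract(x, matrix):
    """Return Y = X M over GF(2), where x is a list of n bits."""
    m = len(matrix[0])
    y = [0] * m
    for bit, row in zip(x, matrix):
        if bit:  # only rows selected by a 1-bit of x contribute
            y = [a ^ b for a, b in zip(y, row)]
    return y

rng = random.Random(2024)  # pseudorandom bits instead of truly random bits
n, m = 64, 16
M = fixed_column_weight_matrix(n, m, k=8, rng=rng)
x = [1 if rng.random() < 0.6 else 0 for _ in range(n)]  # biased independent bits
y = extract(x, M)
```

Because the extractor never needs to be inverted, the matrix can be stored row by row as lists of nonzero positions, which is what makes sparse matrices cheap here.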

In the rest of this section, we study deterministic approaches for constructing linear transformations. Here, we focus on a type of independent sources that have been studied in [16], [25], [29], and we call them independent sources with bounded bias. Let $X = x_1x_2...x_n \in \{0,1\}^n$ be an independent sequence generated from such a source; then the probability of $x_i$ for all $1 \le i \le n$ is slightly unpredictable, namely,

$$p_i = P[x_i = 1] \in \left[\tfrac{1}{2} - \tfrac{e}{2},\ \tfrac{1}{2} + \tfrac{e}{2}\right]$$

for a constant $e$ with $0 < e < 1$.

The following theorem shows that if the weight distribution 
of a linear code is binomial, then the transpose of its generator 
matrix is a good candidate for extracting randomness from 
independent sources with bounded bias. 

Theorem 2. Let C be a linear code with dimension m and codeword length n. Assume its weight distribution is binomial and its generator matrix is G. Let $X = x_1x_2...x_n \in \{0,1\}^n$ be an independent source such that $P[x_i = 1] \in [\frac{1}{2} - e/2, \frac{1}{2} + e/2]$ for all $1 \le i \le n$, and let $Y = XG^T$. If $\frac{m}{n\log_2\frac{2}{1+e}} < 1$, as $n \to \infty$, we have

$$\rho(Y) \to 0.$$



Proof: Following Equ. (0 in the proof of Theorem |4] we 



get 



$$\rho(Y) \le \frac{1}{2} \sum_{u \ne 0} e^{w(uG)} = \frac{1}{2} \sum_{i=1}^{n} \frac{2^m \binom{n}{i}}{2^n}\, e^i \le 2^{m-n-1}(1+e)^n.$$
Then it is easy to see that if $\frac{m}{n\log_2\frac{2}{1+e}} < 1$, as $n \to \infty$, we have

$$\rho(Y) \to 0.$$

This completes the proof. ■ 

According to the theorem above, as n becomes large enough, we can extract as many as $n\log_2\frac{2}{1+e}$ random bits based on the generator matrix of a linear code with binomial weight distribution. Note that the min-entropy of the source can be as low as

$$H_{\min}(X) = n\log_2\frac{2}{1+e},$$

which is achieved when $p_i = \frac{1}{2} + \frac{e}{2}$ for all $1 \le i \le n$. Hence, this construction is as efficient as that based on random matrices; both are asymptotically optimal.
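
As a quick numerical check of this claim, the sketch below compares the extractable length $n\log_2\frac{2}{1+e}$ with the min-entropy of the worst-case source in which every bit has $p_i = \frac{1}{2} + \frac{e}{2}$. The function names and the values n = 1000 and e = 0.2 are illustrative assumptions.

```python
import math

def extractable_bits_bounded_bias(n, e):
    """n * log2(2 / (1 + e)): the asymptotic output length stated above."""
    return n * math.log2(2.0 / (1.0 + e))

def min_entropy_independent(probs):
    """Min-entropy of independent bits with P[x_i = 1] = probs[i]."""
    return -sum(math.log2(max(p, 1.0 - p)) for p in probs)

n, e = 1000, 0.2
worst_case = [0.5 + e / 2] * n  # every bit biased as far as the bound allows
extractable = extractable_bits_bounded_bias(n, e)
entropy = min_entropy_independent(worst_case)
```

The two quantities coincide for this worst-case source, which is the sense in which the construction is asymptotically optimal.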

It turns out that the generator matrices of primitive BCH codes are good candidates. For a primitive BCH code of length $2^k - 1$, it is known that the weight distribution of the code is approximately binomial; see Theorems 21 and 23 in [17]. Namely, the number $b_i$ of codewords of weight i is

$$b_i = a\binom{2^k-1}{i}(1+E_i),$$

where a is a constant, and the error term $E_i$ tends to zero as k grows.

We see that for uniform random matrices (with each entry being 0 or 1 with probability 1/2), the weight distributions are binomial in expectation; for sparse random matrices and primitive binary BCH codes, the weight distributions are approximately binomial. A binomial weight distribution is one of the important features of 'good' matrices, based on which one can extract randomness efficiently from independent sources.
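
To make the notion of a weight distribution concrete, the following sketch enumerates the codewords of the [7,4] Hamming code, which is the primitive BCH code of length $2^3 - 1$ with designed distance 3, and places its weight counts next to the binomial profile $2^m\binom{n}{i}/2^n$. The particular systematic generator matrix is chosen here for illustration; at such a short length the agreement is only coarse, since the approximation is an asymptotic statement.

```python
from itertools import product
from math import comb

# Systematic generator matrix of the [7,4] Hamming code (illustrative choice).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def weight_distribution(gen):
    """Count the codewords uG of each Hamming weight by brute force."""
    m, n = len(gen), len(gen[0])
    counts = [0] * (n + 1)
    for u in product([0, 1], repeat=m):
        codeword = [sum(u[r] * gen[r][c] for r in range(m)) % 2 for c in range(n)]
        counts[sum(codeword)] += 1
    return counts

actual = weight_distribution(G)
m, n = len(G), len(G[0])
binomial = [2 ** m * comb(n, i) / 2 ** n for i in range(n + 1)]
```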

V. Hidden Markov Sources 

A generalized model of an independent source is a hidden Markov source. Given a hidden Markov source $X = x_1x_2...x_n \in \{0,1\}^n$, let $\theta_i$ be the complete information about the system at time i with $1 \le i \le n$. Examples of this system information include the value of the noise signal, the temperature, the environmental effects, the bit generated at time i, etc. So the bit generated at time i, i.e., $x_i$, is just a function of $\theta_i$. We say that a source has the hidden Markov property if and only if for all $1 < i \le n$,

$$P[x_i \mid \theta_{i-1}, x_{i-1}, x_{i-2}, ..., x_1] = P[x_i \mid \theta_{i-1}].$$

That means the bit generated at time i only depends on the complete system information at time i - 1. Such sources are good descriptions of many natural sources for the purpose of high-speed random number generation, like those based on thermal noise, avalanche noise, etc.

Example 2. Let us consider a weak random source based on thermal noise. By sampling the noise signal, we get a time sequence of real numbers:

$$y_1y_2...y_n \in \mathbb{R}^n.$$

This time sequence has the Markov property, i.e.,

$$P[y_i \mid y_{i-1}, ..., y_1] = P[y_i \mid y_{i-1}].$$

By comparing the value at each time with a fixed threshold, we get a binary sequence as the source

$$X = x_1x_2...x_n \in \{0,1\}^n,$$

such that $x_i = \mathrm{sgn}(y_i - a)$ with a constant a for all $1 \le i \le n$.
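
A source of this kind can be simulated with a short sketch. The real-valued signal is modeled here as a first-order autoregressive Gaussian process, which is an assumption made only for illustration (the example above does not fix a noise model); the correlation 0.3, the threshold 0 and the seed are likewise hypothetical.

```python
import random

def thresholded_noise_bits(n, rho, threshold, rng):
    """Sample y_i = rho * y_{i-1} + Gaussian noise and output x_i = [y_i > threshold]."""
    bits, y = [], 0.0
    for _ in range(n):
        y = rho * y + rng.gauss(0.0, 1.0)  # order-1 Markov real-valued signal
        bits.append(1 if y > threshold else 0)
    return bits

rng = random.Random(7)
x = thresholded_noise_bits(10_000, rho=0.3, threshold=0.0, rng=rng)
# Fraction of adjacent bit pairs that agree; 0.5 would indicate no correlation.
agree = sum(a == b for a, b in zip(x, x[1:])) / (len(x) - 1)
```

The agreement rate above 1/2 reflects the local correlation that the hidden Markov model is meant to capture.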
To analyze the performance of linear transformations on hidden Markov sources, we assume that the external noise of the sources is bounded; hence, we assume that for any three time points $1 \le i_1 < i_2 < i_3 \le n$,

$$P[x_{i_2} = 1 \mid \theta_{i_1}, \theta_{i_3}] \in \left[\tfrac{1}{2} - \tfrac{e}{2},\ \tfrac{1}{2} + \tfrac{e}{2}\right]$$

for a constant e.

Lemma 7. Let $X = x_1x_2...x_n$ be a binary sequence generated from a hidden Markov source described above. Let $z = x_{i_1} + ... + x_{i_t} \bmod 2$ for $1 \le i_1 < i_2 < ... < i_t \le n$ with some t, then we have

$$\left|P[z = 1] - \tfrac{1}{2}\right| \le \frac{e^{(t-1)/2}}{2}. \qquad (11)$$

Proof:

$$\begin{aligned}
\left|P[z = 1] - \tfrac{1}{2}\right| &= \left|\sum_{\theta_{i_1},\theta_{i_3},...} P[\theta_{i_1},\theta_{i_3},...]\, P[z = 1 \mid \theta_{i_1},\theta_{i_3},...] - \tfrac{1}{2}\right| \\
&\le \max_{\theta_{i_1},\theta_{i_3},...} \left|P[x_{i_2} + x_{i_4} + ... = 1 \mid \theta_{i_1},\theta_{i_3},...] - \tfrac{1}{2}\right|.
\end{aligned}$$

Given $\theta_{i_1}, \theta_{i_3}, ...$, we have $x_{i_2}, x_{i_4}, ...$ independent of each other. So the conclusion is immediate following the statement of Lemma 1. ■

For some hidden Markov sources, the constraint e is not so strict. It is possible that there exists a group of $\theta_{i_1}, \theta_{i_3}, ...$ such that

$$\left|P[z = 1 \mid \theta_{i_1},\theta_{i_3},...] - \tfrac{1}{2}\right| > \frac{e^{(t-1)/2}}{2}.$$

In this case, we may find a typical set S such that

$$P[(\theta_{i_1},\theta_{i_3},...) \in S] \to 1$$

as the sequence becomes long enough, and in this typical set,

$$\left|P[z = 1 \mid (\theta_{i_1},\theta_{i_3},...) \in S] - \tfrac{1}{2}\right| \le \frac{e^{(t-1)/2}}{2}.$$

In this case, we can write

$$\left|P[z = 1] - \tfrac{1}{2}\right| \le P[(\theta_{i_1},\theta_{i_3},...) \notin S] + \max_{(\theta_{i_1},\theta_{i_3},...) \in S} \left|P[z = 1 \mid \theta_{i_1},\theta_{i_3},...] - \tfrac{1}{2}\right|,$$

where the first term on the right-hand side is negligible.

Note that Equ. (11) can be rewritten as

$$\left|P[z = 1] - \tfrac{1}{2}\right| \le \frac{(\sqrt{e})^t}{2\sqrt{e}},$$

which is very similar to the result in Lemma 1. If we ignore the constant term $\sqrt{e}$, the only difference between them is replacing e by $\sqrt{e}$. Based on this observation as well as the results in Section IV for independent sources, we can obtain the following results for hidden Markov sources.

Lemma 8. Let $X = x_1x_2...x_n$ be a binary sequence generated from a hidden Markov source described above. Let M be an $n \times m$ random matrix such that each entry of M is 0 or 1 with probability $\frac{1}{2}$. Then given $Y = XM$, we have

$$E_M[\rho(Y)] \le \ \ldots$$

So with a uniform random matrix, one can extract as many as $n\log_2\frac{2}{1+\sqrt{e}}$ random bits from a hidden Markov source.

And this conclusion is also true for sparse random matrices, given by the following theorem.

Theorem 3. Let $X = x_1x_2...x_n$ be a binary sequence generated from a hidden Markov source described above. Let M be an $n \times m$ binary random matrix in which the probability of each entry being 1 is $p = \omega(\frac{\log n}{n}) \le \frac{1}{2}$. Assume $Y = XM$. If $\frac{m}{n\log_2\frac{2}{1+\sqrt{e}}} < 1$, as n becomes large enough, we have that $\rho(Y)$ converges to 0 in probability, i.e.,

$$\rho(Y) \xrightarrow{p} 0.$$

Proof: The proof follows the same idea as the proof of Theorem 1. ■

Theorem 4. Let C be a linear binary code with dimension m and codeword length n. Assume its weight distribution is binomial and its generator matrix is G. Let $X = x_1x_2...x_n$ be a binary sequence generated from a hidden Markov source described above, and let $Y = XG^T$. If $\frac{m}{n\log_2\frac{2}{1+\sqrt{e}}} < 1$, as $n \to \infty$, we have

$$\rho(Y) \to 0.$$

Proof: The proof follows the same idea as the proof of Theorem 2. ■

These theorems show that when n is large enough, we can extract as many as $n\log_2\frac{2}{1+\sqrt{e}}$ random bits from a hidden Markov source using linear transformations.

Let us consider an order-1 Markov source as a special instance. Assume that $X = x_1x_2...x_n$ is a binary sequence generated from this source such that each bit $x_i \in \{0,1\}$ only depends on its previous bit, namely,

$$P[x_i = 1 \mid x_{i-1}] \in \left[\tfrac{1}{2} - \tfrac{e}{2},\ \tfrac{1}{2} + \tfrac{e}{2}\right]$$

for a constant e. Note that the transition probabilities are slightly unpredictable.

We first show that such a source can be treated as a (hidden) Markov source such that for any $1 \le i_{j-1} < i_j < i_{j+1} \le n$,

$$\left|P[x_{i_j} = 1 \mid x_{i_{j-1}}, x_{i_{j+1}}] - \tfrac{1}{2}\right| \le \frac{e'}{2}$$

for a constant $e'$.
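According to the definition, we have

$$\begin{aligned}
\left|P[x_{i_j} = 1 \mid x_{i_{j-1}}] - \tfrac{1}{2}\right|
&= \left|\sum_{x_{i_{j-1}+1},...,x_{i_j-1}} P[x_{i_j} = 1 \mid x_{i_j-1}] \cdots P[x_{i_{j-1}+1} \mid x_{i_{j-1}}] - \tfrac{1}{2}\right| \\
&\le \sum_{x_{i_{j-1}+1},...,x_{i_j-1}} P[x_{i_j-1} \mid x_{i_j-2}] \cdots P[x_{i_{j-1}+1} \mid x_{i_{j-1}}] \times \left|P[x_{i_j} = 1 \mid x_{i_j-1}] - \tfrac{1}{2}\right| \\
&\le \frac{e}{2}.
\end{aligned}$$

As a result,

$$\begin{aligned}
\left|P[x_{i_j} = 1 \mid x_{i_{j-1}}, x_{i_{j+1}}] - \tfrac{1}{2}\right|
&= \left|\frac{P[x_{i_{j-1}}]\, P[x_{i_j} = 1 \mid x_{i_{j-1}}]\, P[x_{i_{j+1}} \mid x_{i_j} = 1]}{\sum_{x_{i_j}} P[x_{i_{j-1}}]\, P[x_{i_j} \mid x_{i_{j-1}}]\, P[x_{i_{j+1}} \mid x_{i_j}]} - \tfrac{1}{2}\right| \\
&\le \frac{(\frac{1}{2}+\frac{e}{2})^2}{(\frac{1}{2}+\frac{e}{2})^2 + (\frac{1}{2}-\frac{e}{2})^2} - \frac{1}{2} \\
&= \frac{e}{1+e^2}.
\end{aligned}$$

Then, by setting $e' = \frac{2e}{1+e^2}$, we can get

$$\left|P[x_{i_j} = 1 \mid x_{i_{j-1}}, x_{i_{j+1}}] - \tfrac{1}{2}\right| \le \frac{e'}{2}$$

for all $1 \le i_{j-1} < i_j < i_{j+1} \le n$.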

According to the above theorems, with linear transformations, we can extract as many as $n\log_2\frac{2}{1+\sqrt{2e/(1+e^2)}}$ random bits from the above source asymptotically. In this case,

$$n\log_2\frac{2}{1+\sqrt{\frac{2e}{1+e^2}}} < \min_X H_{\min}(X) = n\log_2\frac{2}{1+e}.$$

That means linear transformations are not optimal for extracting randomness from order-1 Markov sources. This is true for most hidden Markov sources. But we need to see that linear transformations have good capabilities of tolerating local correlations. The gap between their information efficiency and the optimum is reasonably small for hidden Markov sources, especially considering their constructive simplicity. In high-speed random number generation, the physical sources usually have relatively good quality, namely, the bits are roughly independent (with a very small amount of correlation). In this case, linear transformations are very efficient in extracting randomness.
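
To get a feel for the size of this gap, the sketch below evaluates the two per-bit rates for a few values of e: the rate guaranteed for linear transformations on an order-1 Markov source, using the constant $2e/(1+e^2)$ derived above, and the min-entropy rate $\log_2\frac{2}{1+e}$. The function names and the sample values of e are illustrative assumptions.

```python
import math

def linear_rate_order1(e):
    """Per-bit rate log2(2 / (1 + sqrt(e'))) with e' = 2e / (1 + e^2)."""
    e_prime = 2.0 * e / (1.0 + e * e)
    return math.log2(2.0 / (1.0 + math.sqrt(e_prime)))

def min_entropy_rate(e):
    """Per-bit rate log2(2 / (1 + e)), the smallest min-entropy quoted above."""
    return math.log2(2.0 / (1.0 + e))

# Each row holds (e, rate guaranteed for linear maps, min-entropy rate).
rows = [(e, linear_rate_order1(e), min_entropy_rate(e)) for e in (0.01, 0.05, 0.1, 0.2)]
```

For every listed e the linear rate is strictly below the min-entropy rate, and both rates increase as e decreases, that is, as the source gets closer to unbiased and independent.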

VI. Bit-Fixing Sources 

In this section, we consider another type of weak random sources, called bit-fixing sources, first studied by Cohen and Wigderson [6]. In an oblivious bit-fixing source X of length n, k bits in X are unbiased and independent, and the remaining n - k bits are fixed. The positions of the k bits are unknown. In fact, an oblivious bit-fixing source is a special type of the independent sources that we studied in the previous sections, where all the bits in the source are independent of each other; among them, k bits have probability 1/2 and the other n - k bits have probability either 0 or 1. So our conclusions about the application of sparse random matrices on independent sources still hold here.

Another type of bit-fixing sources is nonoblivious. Unlike the oblivious case, in nonoblivious bit-fixing sources, the remaining n - k bits are linearly determined by the k independent and unbiased bits. Such sources were originally studied in the context of collective coin flipping [4].

Generally, we can describe a (nonoblivious) bit-fixing source in the following way: Let $Z \in \{0,1\}^k$ be an independent and unbiased sequence; the source $X \in \{0,1\}^n$ can be written as $X = ZA$, where A is an unknown $k \times n$ binary matrix such that there are k columns in A that form an identity matrix.

Example 3. One example of such a matrix A is 



A = 



If we consider the columns 2,4,3, then they form an identity 
matrix. 
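
The model $X = ZA$ can be illustrated with a small sketch. The matrix below is a hypothetical stand-in, not the matrix of the example above; it is chosen so that its columns 2, 4, 3 (1-indexed) form an identity matrix, and the seed is arbitrary.

```python
import random

# Hypothetical 3 x 5 matrix: columns 2, 4, 3 (1-indexed) form the identity.
A = [
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [1, 0, 1, 0, 0],
]

def bit_fixing_sample(a, rng):
    """Draw Z uniformly from {0,1}^k and return (Z, X) with X = Z A over GF(2)."""
    k, n = len(a), len(a[0])
    z = [rng.randrange(2) for _ in range(k)]
    x = [sum(z[i] * a[i][j] for i in range(k)) % 2 for j in range(n)]
    return z, x

z, x = bit_fixing_sample(A, random.Random(1))
```

The bits of X at the identity columns reproduce Z directly, while the remaining bits are linear combinations of Z and carry no additional entropy.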

Given a bit-fixing source with k independent and unbiased 
bits, one cannot extract more than k random bits that are 
arbitrarily close to truly random bits. That's because the 
entropy of the output sequence must be upper bounded by 
the entropy of the input sequence, which is k. 

Lemma 9. Let $X = x_1x_2...x_n \in \{0,1\}^n$ be a bit-fixing source in which k bits are unbiased and independent. Let M be an $n \times m$ uniform random matrix such that each entry of M is 0 or 1 with probability $\frac{1}{2}$. Given $Y = XM$, we have

$$P_M[\rho(Y) \ne 0] \le 2^{m-k}.$$

Proof: For a bit-fixing source $X \in \{0,1\}^n$, we can write it as $X = ZA$, where $Z \in \{0,1\}^k$ is an independent and unbiased sequence. Hence,

$$Y = XM = ZAM = ZB,$$

in which $B = AM$ is a $k \times m$ matrix.

We see that all the columns of B are independent of each other because the ith column of B only depends on the ith column of M for all $1 \le i \le m$. Furthermore, it can be proved that each column of B is a vector in which all the elements are independent of each other and each element is 0 or 1 with probability 1/2. To see this, we consider an entry in B, which is $B_{ij} = \sum_{t} A_{it}M_{tj}$. Given this i, according to the definition of A, we can always find a column r such that only the element in the ith row is 1 and all the other elements in this column are 0s. So we can write

$$B_{ij} = M_{rj} + \sum_{t \ne r} A_{it}M_{tj},$$

$$B_{i'j} = \sum_{t \ne r} A_{i't}M_{tj}, \quad \text{for } i' \ne i,$$

where $M_{rj}$ is an unbiased random bit independent of $M_{tj}$ with $t \ne r$. In this case, $B_{ij}$ is independent of $B_{i'j}$ with $i' \ne i$. Hence, we can conclude that B is a random matrix in which each entry is 0 or 1 with probability 1/2.

According to Lemma 2, we get that $\rho(Y) = 0$ if and only if $ZBu^T$ is an unbiased random bit for all $u \ne 0$.

Hence,

$$P_M[\rho(Y) \ne 0] \le P_M[\exists u \ne 0 : ZBu^T \text{ is fixed}] \le \sum_{u \ne 0} P_B[Bu^T = 0], \qquad (12)$$

where $Bu^T$ is a random vector with each element being 0 or 1 with probability 1/2 for all $u \ne 0$. So

$$P_B[Bu^T = 0] = 2^{-k}.$$

Finally, we can get that

$$P_M[\rho(Y) \ne 0] \le \sum_{u \ne 0} 2^{-k} \le 2^{m-k}.$$

This completes the proof. ■ 
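
The bound can be probed numerically. Since $ZBu^T$ is unbiased exactly when $Bu^T \ne 0$, the output $Y = ZB$ is uniform exactly when B has rank m over GF(2). The sketch below samples uniform $k \times m$ matrices B directly (as justified in the proof above) and counts how often the rank falls short. The sizes k = 12 and m = 6, the trial count and the seed are illustrative assumptions.

```python
import random

def gf2_rank(rows):
    """Rank over GF(2) of a matrix given as a list of 0/1 row lists."""
    pivots = {}  # highest set bit -> reduced row stored as an integer bitmask
    for row in rows:
        value = int("".join(map(str, row)), 2) if row else 0
        while value:
            top = value.bit_length() - 1
            if top not in pivots:
                pivots[top] = value
                break
            value ^= pivots[top]  # clear the leading bit and keep reducing
    return len(pivots)

def failure_rate(k, m, trials, rng):
    """Fraction of uniform k x m matrices B for which Y = Z B is not uniform."""
    failures = 0
    for _ in range(trials):
        b = [[rng.randrange(2) for _ in range(m)] for _ in range(k)]
        if gf2_rank(b) < m:  # some u != 0 satisfies B u^T = 0
            failures += 1
    return failures / trials

rate = failure_rate(k=12, m=6, trials=2000, rng=random.Random(3))
bound = 2.0 ** (6 - 12)  # the bound 2^(m - k) from the lemma
```

The empirical rate is a Monte Carlo estimate, so it fluctuates around the true failure probability and should be compared with the bound only up to sampling error.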

According to the above lemma, by using a uniform random matrix with $m - k \ll 0$, we can generate an independent and unbiased sequence from a bit-fixing source with probability almost 1. In the following theorem, we show that sparse random matrices can also work for bit-fixing sources.

Theorem 5. Let $X = x_1x_2...x_n \in \{0,1\}^n$ be a bit-fixing source in which k bits are unbiased and independent. Let M be an $n \times m$ binary random matrix in which the probability for each entry being 1 is $p = \omega(\frac{\log n}{n}) \le \frac{1}{2}$. Assume $Y = XM$. If $\frac{m}{k} < 1$, as n becomes large enough, we have that $\rho(Y) = 0$ with probability almost 1, i.e.,

$$P_M[\rho(Y) = 0] \to 1.$$

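Proof: According to Equ. (12), we have

$$P_M[\rho(Y) \ne 0] \le \sum_{u \ne 0} P_M[AMu^T = 0].$$

When $u \ne 0$, $Mu^T$ is a random vector in which all the elements are independent of each other. Let $\|u\| = j$; then, according to Lemma 1, the probability for each element in $Mu^T$ being 1 is

$$p_j \in \left[\tfrac{1}{2}\left(1-(1-2p)^j\right),\ \tfrac{1}{2}\left(1+(1-2p)^j\right)\right].$$

Let $v^T = AMu^T$ and let $v_i$ denote its ith element; then

$$P_M[v^T = 0] = \prod_{i=1}^{k} P_M[v_i = 0 \mid v_1 = 0, ..., v_{i-1} = 0].$$

According to the constraint on A, we know that there exists a column that is $[0, ..., 0, 1, 0, ..., 0]^T$, in which only the entry in the ith row is 1. Without loss of generality, we assume that this column is the rth column. Then we can write

$$v_i = (Mu^T)_r + \sum_{t \ne r} a_{it}(Mu^T)_t,$$

where $(Mu^T)_r$ is 1 with probability $p_j \in [\frac{1}{2}(1-(1-2p)^j), \frac{1}{2}(1+(1-2p)^j)]$, and it is independent of $v_1, ..., v_{i-1}$. Hence,

$$\begin{aligned}
P_M[v_i = 0 \mid v_1 = 0, ..., v_{i-1} = 0]
&= \sum_{a=0}^{1} P_M[(Mu^T)_r = a] \times P_M\Big[\sum_{t \ne r} a_{it}(Mu^T)_t = a \,\Big|\, v_1 = 0, ..., v_{i-1} = 0\Big] \\
&\le \max_{a \in \{0,1\}} P_M[(Mu^T)_r = a] \\
&\le \tfrac{1}{2}\left(1+(1-2p)^j\right).
\end{aligned}$$

So when $\|u\| = j$, we can get

$$P_M[AMu^T = 0] \le \left(\tfrac{1}{2}\left(1+(1-2p)^j\right)\right)^k.$$

As a result,

$$P_M[\rho(Y) \ne 0] \le \sum_{j=1}^{m} \binom{m}{j}\left(\tfrac{1}{2}\left(1+(1-2p)^j\right)\right)^k.$$

Let us divide it into two parts,

$$\gamma_1 = \sum_{j=1}^{\frac{\log(1/\epsilon)}{2p}} \binom{m}{j}\left(\tfrac{1}{2}\left(1+(1-2p)^j\right)\right)^k,$$

$$\gamma_2 = \sum_{j>\frac{\log(1/\epsilon)}{2p}}^{m} \binom{m}{j}\left(\tfrac{1}{2}\left(1+(1-2p)^j\right)\right)^k,$$

where $\epsilon$ is arbitrarily small. Then

$$P_M[\rho(Y) \ne 0] \le \gamma_1 + \gamma_2.$$

According to Lemma 6, we can get that the first part $\gamma_1 \to 0$ as $n \to \infty$.

For the second part $\gamma_2$, it is easy to show that for any $\epsilon > 0$, when n (or k) is large enough,

$$\gamma_2 = \sum_{j>\frac{\log(1/\epsilon)}{2p}}^{m} \binom{m}{j}\left(\tfrac{1}{2}\left(1+(1-2p)^j\right)\right)^k \le 2^{m-k}(1+\epsilon)^k.$$

As a result, if $m - k\log_2\frac{2}{1+\epsilon} \ll 0$ for an arbitrary $\epsilon$, then $P_M[\rho(Y) \ne 0]$ can be very small. Therefore, we get the conclusion in the theorem.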

This completes the proof. ■ 

We see that sparse random matrices are asymptotically optimal for extracting randomness from bit-fixing sources. Now a question is whether we can find an explicit construction of linear transformations for extracting randomness efficiently from any bit-fixing source specified by n and k. Unfortunately, the answer is negative. The reason is that in order to extract independent random bits, it requires $XMu^T$ to be an unbiased random bit for all $u \ne 0$ (see the proof above). So $\|Mu^T\| > n - k$ for all $u \ne 0$; otherwise we are able to find a bit-fixing source X such that $XMu^T$ is fixed. Such a bit-fixing source can be constructed as follows: Assume $X = x_1x_2...x_n$; if $(Mu^T)_i = 1$ we fix $x_i = 0$, otherwise we set $x_i$ as an unbiased random bit. Since at most n - k entries of $Mu^T$ are ones, at least k bits of X remain unbiased. It further implies that if we have a linear code with generator matrix $M^T$, then its minimum distance should be more than n - k. But for such a matrix, its efficiency ($\frac{m}{k}$) is usually very low. For example, when $k = \frac{n}{2}$, we have to find a linear code with minimum distance more than $\frac{n}{2}$. In this case, the dimension of the code, i.e., m, is much smaller than k, implying a low efficiency in randomness extraction.

VII. Linear-Subspace Sources

In the previous section, we studied a bit-fixing source $X \in \{0,1\}^n$, which can be written as ZA, where $Z \in \{0,1\}^k$ is an independent and unbiased sequence and A is an unknown $k \times n$ matrix that embeds an identity matrix. Actually, we can generalize the model of bit-fixing sources in two directions. First, the matrix A can be generalized to any full-rank matrix. Second, the sequence Z is not necessarily independent and unbiased. Instead, it can be any random source described in this paper, like an independent source or a hidden Markov source. The new generalized source X can be treated as a mapping of another source Z into a linear subspace of higher dimensions, so we call it a linear-subspace source. The rows of the matrix A, which are linearly independent, form a basis of the linear subspace. Linear-subspace sources are good descriptions of many natural sources, like sparse images studied in compressive sensing.

First, let us consider the case that the matrix A is an arbi- 
trary unknown full rank matrix and Z is still an independent 
and unbiased sequence. 

Lemma 10. Let $X = x_1x_2...x_n \in \{0,1\}^n$ be a source such that $X = ZA$ in which Z is an independent and unbiased sequence, and A is an unknown $k \times n$ full-rank matrix. Let M be an $n \times m$ random matrix such that each entry of M is 1 with probability $p = \omega(\frac{\log n}{n}) \le \frac{1}{2}$. Assume $Y = XM$. If $\frac{m}{k} < 1$, as n becomes large enough, we have $\rho(Y) = 0$ with probability almost 1, i.e.,

$$P_M[\rho(Y) = 0] \to 1.$$

Proof: In the proof of Theorem 5, we have

$$P_M[\rho(Y) \ne 0] \le \sum_{u \ne 0} P_M[AMu^T = 0].$$

If the matrix A has full rank, then we can write

$$A = UR,$$

where $\det(U) \ne 0$ and R is in row echelon form. We see that $ZR$ is a nonoblivious bit-fixing source.

Since $\det(U) \ne 0$, $AMu^T = 0$ is equivalent to $RMu^T = 0$. Therefore,

$$P_M[\rho(Y) \ne 0] \le \sum_{u \ne 0} P_M[RMu^T = 0].$$

Based on the proof of Theorem 5, we can get the conclusion in the lemma.

This completes the proof. ■ 
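
The factorization $A = UR$ used in this proof corresponds to Gaussian elimination over GF(2). The sketch below computes the reduced row echelon form R of a small full-rank matrix and shows that its pivot columns form an identity matrix, which is the structure required of a nonoblivious bit-fixing source. The matrix A is a hypothetical example, and using the reduced form is an assumption made here so that the identity columns are explicit.

```python
def gf2_rref(matrix):
    """Reduced row echelon form over GF(2); returns (R, pivot_columns)."""
    rows = [row[:] for row in matrix]
    pivot_cols, r = [], 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
        if r == len(rows):
            break
    return rows, pivot_cols

# Hypothetical full-rank 3 x 5 matrix none of whose columns is a unit vector.
A = [
    [1, 1, 0, 1, 0],
    [0, 1, 1, 1, 1],
    [1, 0, 1, 1, 1],
]
R, pivots = gf2_rref(A)
identity_part = [[R[i][c] for c in pivots] for i in range(len(R))]
```

Here R is obtained from A by invertible row operations, so the row spaces of A and R coincide even though A itself contains no identity submatrix.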

Furthermore, we generalize the sequence Z to a general independent source in which the probability of each bit is unknown and the min-entropy of the source is $H_{\min}(Z)$.

Theorem 6. Let $X = x_1x_2...x_n \in \{0,1\}^n$ be a source such that $X = ZA$ in which Z is an independent sequence and A is an unknown $k \times n$ full-rank matrix. Let M be an $n \times m$ random matrix such that each entry of M is 1 with probability $p = \omega(\frac{\log n}{n}) \le \frac{1}{2}$. Assume $Y = XM$. If $\frac{m}{H_{\min}(X)} < 1$, as $n \to \infty$, $\rho(Y)$ converges to 0 in probability, i.e.,

$$\rho(Y) \xrightarrow{p} 0.$$

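Proof: Let $\delta_i$ be the bias of $z_i$ in Z for all $1 \le i \le k$. According to Equ. (10), we can get

$$E_M[\rho(Y)] \le \frac{1}{2} \sum_{u \ne 0} \sum_{v \in \{0,1\}^k} P_M[AMu^T = v^T] \prod_{i=1}^{k} |2\delta_i|^{v_i}.$$

When $\|u\| = j$, $Mu^T$ is an independent sequence in which each bit is one with probability

$$p_j \in \left[\tfrac{1}{2}\left(1-(1-2p)^j\right),\ \tfrac{1}{2}\left(1+(1-2p)^j\right)\right].$$

In the proof of Theorem 5, we have proved that

$$P_M[AMu^T = 0] \le \left(\tfrac{1}{2}\left(1+(1-2p)^j\right)\right)^k.$$

Using the same idea, if $A = UR$ with $\det(U) \ne 0$ and R in row echelon form, we can write

$$\begin{aligned}
P_M[AMu^T = v^T] &= P_M[RMu^T = U^{-1}v^T] \\
&= \prod_{i=1}^{k} P_M\big[(RMu^T)_i = (U^{-1}v^T)_i \,\big|\, (RMu^T)_{i-1} = (U^{-1}v^T)_{i-1}, ...\big] \\
&\le \left(\tfrac{1}{2}\left(1+(1-2p)^j\right)\right)^k
\end{aligned}$$

for all $v^T \in \{0,1\}^k$.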
Hence 




In the next step, following the proof of Theorem 1, we can get that if $\frac{m}{H_{\min}(Z)} < 1$, as $n \to \infty$,

$$E_M[\rho(Y)] \to 0.$$

It is equivalent to $\rho(Y) \xrightarrow{p} 0$.

Since $H_{\min}(Z) = H_{\min}(X)$, we can get the conclusion in the theorem.

This completes the proof. ■ 

A similar result holds if Z is a hidden Markov sequence. 
In this case, we have the following theorem. 
Theorem 7. Let $X = x_1x_2...x_n \in \{0,1\}^n$ be a source such that $X = ZA$ in which $Z \in \{0,1\}^k$ is a hidden Markov sequence described in Section V and A is an unknown $k \times n$ full-rank matrix. Let M be an $n \times m$ random matrix such that each entry of M is 1 with probability $p = \omega(\frac{\log n}{n}) \le \frac{1}{2}$. Assume $Y = XM$. If $\frac{m}{k\log_2\frac{2}{1+\sqrt{e}}} < 1$, as $n \to \infty$, $\rho(Y)$ converges to 0 in probability, i.e.,

$$\rho(Y) \xrightarrow{p} 0.$$

From the above theorems, we see that multiplying a given source by an invertible matrix does not affect the extracting capability of sparse random matrices.

VIII. Implementation for High-Speed Applications 

In this section, we discuss the implementation of linear transformations in high-speed random number generators, where the physical sources usually provide a stream rather than a sequence of finite length. To generate random bits, we can apply a linear transformation to the incoming stream block by block, namely, we divide the incoming stream into blocks and generate random bits from each block separately. Such an operation can be carried out in software or in hardware like FPGAs [19], [32].

Another way is to process each bit when it arrives. In this case, let $M = \{m_{ij}\}$ be an $n \times m$ matrix (such as a sparse random matrix) for processing the incoming stream and let $V \in \{0,1\}^m$ denote a vector that stores m bits. The vector V is updated dynamically in response to the incoming bits. When the ith bit of the stream, denoted by $x_i$, arrives, we do the following operation on V,

$$V \to V + x_i M_{1+(i \bmod n)},$$

where $M_j$ is the jth row of the matrix M. Specifically, we can write the vector V at time i as $V[i]$ and denote its jth element as $V_j[i]$. To generate (almost) random bits, we output the bits in V sequentially and cyclically at a lower rate than that of the incoming stream. Namely, we generate an output stream $Y = y_1y_2...$ such that

$$y_i = V_{1+(i \bmod m)}\left[n + \left\lfloor \frac{in}{m} \right\rfloor\right].$$

So the rate of the output stream is $\frac{m}{n}$ of that of the incoming stream. In this method, the expected computational time for processing a single incoming bit is proportional to the number of ones in M divided by n. According to our results on sparse random matrices, it can be as low as $(\log n)^a$ with any $a > 1$ asymptotically. So this method is computationally very efficient, and the workload is well balanced.
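
The bit-by-bit scheme can be sketched as follows. This is a simplified illustration rather than a reference implementation: the class and function names are hypothetical, indices are 0-based (so the bit with 0-based index i uses row i mod n), and the output schedule emits the output with 0-based index i once $n + \lfloor in/m \rfloor$ input bits have been consumed, which is one reading of the rule above.

```python
class StreamingExtractor:
    """Bit-by-bit update V <- V + x_i * (row i mod n of M) over GF(2)."""

    def __init__(self, matrix):
        self.n, self.m = len(matrix), len(matrix[0])
        # Keep each row as the list of column indices that hold a 1 (sparse form).
        self.rows = [[j for j, bit in enumerate(row) if bit] for row in matrix]
        self.state = [0] * self.m
        self.count = 0  # number of input bits consumed so far

    def push(self, bit):
        if bit:
            for j in self.rows[self.count % self.n]:
                self.state[j] ^= 1  # work done equals the number of ones in the row
        self.count += 1

    def read(self, j):
        return self.state[j % self.m]

def run(stream_bits, matrix):
    """Feed a finite stream and collect the outputs emitted along the way."""
    ext = StreamingExtractor(matrix)
    out = []
    for bit in stream_bits:
        ext.push(bit)
        # Output i becomes available after n + floor(i * n / m) input bits.
        while ext.count >= ext.n + (len(out) * ext.n) // ext.m:
            out.append(ext.read(len(out)))
    return out

# Toy 4 x 2 matrix and a 6-bit stream, for illustration only.
result = run([1, 0, 0, 1, 1, 0], [[1, 0], [0, 1], [1, 1], [0, 0]])
```

Storing rows as index lists makes the per-bit cost equal to the number of ones in the selected row, which is the reason sparse matrices are preferred in this setting.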

IX. Conclusion 

In this paper, we demonstrated the power of linear transformations in randomness extraction from a few types of weak random sources, including independent sources, hidden Markov sources, bit-fixing sources, and linear-subspace sources, as summarized in Table I. Compared to the existing methods, the constructions of linear transformations are much simpler, and they can be easily implemented using FPGAs; these properties make methods based on linear transformations very practical. To reduce the hardware/computational complexity, we prefer sparse matrices rather than high-density matrices, and we proved that sparse random matrices can work as well as uniform random matrices. Explicit constructions of efficient sparse matrices remain a topic for future research.
TABLE I (entries recoverable from the extracted text)

Independent sources: stated under the condition $P[x_i = 1] \in [\frac{1}{2} - \frac{e}{2}, \frac{1}{2} + \frac{e}{2}]$.

Hidden Markov sources: $n\log_2\frac{2}{1+\sqrt{e}}$ if $P[x_{i_2} = 1 \mid \theta_{i_1}, \theta_{i_3}] \in [\frac{1}{2} - \frac{e}{2}, \frac{1}{2} + \frac{e}{2}]$ (the same condition appears in both columns).

Bit-fixing sources: NA in the last column.

Linear-subspace sources: stated under the condition $X = ZA$ with A full-rank and Z independent; NA in the last column.